Optimizing Performance of Hadoop with Parameter Tuning
نویسندگان
چکیده
منابع مشابه
Hadoop Performance Tuning - A Pragmatic & Iterative Approach
Hadoop represents a Java-based distributed computing framework that is designed to support applications that are implemented via the MapReduce programming model. In general, workload dependent Hadoop performance optimization efforts have to focus on 3 major categories. Namely the systems HW, the systems SW, and the configuration and tuning/optimization of the Hadoop infrastructure components. F...
متن کاملOptimizing Hadoop* Deployments
Intel is a major contributor to open source initiatives, such as Linux*, Apache*, and Xen*, and has also devoted resources to Hadoop analysis, testing, and performance characterizations, both internally and with fellow travelers such as HP and Cloudera. Through these technical efforts, Intel has observed many practical trade-offs in hardware, software, and system settings that have real-world i...
متن کاملTowards Optimizing Hadoop Provisioning in the Cloud
Data analytics is becoming increasingly prominent in a variety of application areas ranging from extracting business intelligence to processing data from scientific studies. MapReduce programming paradigm lends itself well to these data-intensive analytics jobs, given its ability to scale-out and leverage several machines to parallely process data. In this work we argue that such MapReduce-base...
متن کاملOptimizing Sort in Hadoop Using Replacement Selection
This paper presents and evaluates an alternative sorting component for Hadoop based on the replacement selection algorithm. In comparison with the default quicksort-based implementation, replacement selection generates runs which are in average twice as large. This makes the merge phase more efficient, since the amount of data that can be merged in one pass increases in average by a factor of t...
متن کاملPiranha: Optimizing Short Jobs in Hadoop
Cluster computing has emerged as a key parallel processing platform for large scale data. All major internet companies use it as their major central processing platform. One of cluster computing’s most popular examples is MapReduce and its open source implementation Hadoop. These systems were originally designed for batch and massive-scale computations. Interestingly, over time their production...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: ITM Web of Conferences
سال: 2017
ISSN: 2271-2097
DOI: 10.1051/itmconf/20171203040